Scientific computing with non-standard floating point types

Author

Vlăduţ Mădălin Druţa (supervisor: Dr. David Gregg)

Abstract

Master of Science in Computer Science thesis, Faculty of Engineering, Mathematics and Science, Department of Computer Science and Statistics, May 2015. Author: Vlăduţ Mădălin Druţa. Supervisor: Dr. David Gregg.

This study examines the possible use of non-standard floating point types for scientific computing. The question of the thesis is: “Is there anything to be gained by supporting non-standard floating point data types?” There are several gaps in the literature that the thesis aims to address, and there may be potential in the use of non-standard floating point types. The thesis investigates one such type in particular, a 48-bit floating point type (f48). As long as the full precision of the standard 64-bit type is not needed, the 48-bit type requires less memory, reduces the amount of data movement, and might be faster than the standard 64-bit type. The initial findings showed that f48 without Streaming SIMD (Single Instruction, Multiple Data) Extensions (SSE) is slower than standard 64-bit floating point. With SSE intrinsics, however, f48 is competitive with the standard 64-bit type. These are good results for a floating-point type that is not supported in hardware.
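The memory saving described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the bit layout of the 48-bit type, so the layout here is an assumption: keep the top 48 bits of an IEEE 754 64-bit double (1 sign bit, 11 exponent bits, 36 fraction bits) and drop the low 16 fraction bits by truncation. The names `pack_f48` and `unpack_f48` are hypothetical and are not taken from the thesis.

```python
import struct


def pack_f48(x: float) -> bytes:
    """Store a double in 6 bytes (assumed layout, not the thesis's own).

    The big-endian binary64 encoding puts sign, exponent, and the high
    fraction bits first, so keeping the first 6 bytes drops the 16
    least significant fraction bits (truncation, no rounding).
    """
    return struct.pack(">d", x)[:6]


def unpack_f48(b: bytes) -> float:
    """Widen a 6-byte value back to a double by zero-filling the low 16 bits."""
    if len(b) != 6:
        raise ValueError("expected exactly 6 bytes")
    return struct.unpack(">d", b + b"\x00\x00")[0]


if __name__ == "__main__":
    values = [1.0, -2.0, 0.1, 3.141592653589793]
    packed = b"".join(pack_f48(v) for v in values)
    # 6 bytes per value instead of 8: a 25% reduction in storage.
    print(len(packed), "bytes for", len(values), "values")
    for i, v in enumerate(values):
        restored = unpack_f48(packed[6 * i : 6 * i + 6])
        print(v, "->", restored)
```

Under this assumed layout, values whose fraction fits in 36 bits (such as 1.0 or -2.0) survive the round trip exactly, while others lose a relative amount below 2^-36. Arithmetic would still be carried out in 64-bit precision after widening; only storage and data movement shrink.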

Related resources

Optimal design of fixed-point and floating-point arithmetic units for scientific applications

The challenge in designing a floating-point arithmetic co-processor/processor for scientific and engineering applications is to improve the performance, efficiency, and computational accuracy of the arithmetic unit. The arithmetic unit should efficiently support several mathematical functions corresponding to scientific and engineering computation demands. Moreover, the computations should be p...

Accurate Floating-Point Summation Part II: Sign, K-Fold Faithful and Rounding to Nearest

In this Part II of this paper we first refine the analysis of error-free vector transformations presented in Part I. Based on that we present an algorithm for calculating the rounded-to-nearest result of s := ∑ p_i for a given vector of floating-point numbers p_i, as well as algorithms for directed rounding. A special algorithm for computing the sign of s is given, also working for huge dimensions...

Floating Point Unit Generation and Evaluation for FPGAs

Floating point units form an important component of many reconfigurable computing applications. The creation of floating point units under a collection of area, latency, and throughput constraints is an important consideration for system designers. Given the range of possible tradeoffs, most commercial or academic floating point libraries for FPGAs provide a small fraction of possible floating ...

A Library of Parameterizable Floating-Point Cores for FPGAs and Their Application to Scientific Computing

Advances in field programmable gate arrays (FPGAs), which are the platform of choice for reconfigurable computing, have made it possible to use FPGAs in increasingly many areas of computing, including complex scientific applications. These applications demand high performance and high-precision, floating-point arithmetic. Until now, most of the research has not focussed on compliance with IEEE ...

A Distillation Algorithm for Floating-Point Summation

The addition of two or more floating-point numbers is fundamental to numerical computations. This paper describes an efficient “distillation” style algorithm which produces a precise sum by exploiting the natural accuracy of compensated cancellation. The algorithm is applicable to all sets of data but is particularly appropriate for ill-conditioned data, where standard methods fail due to the a...

Publication date: 2015
